Abstract. We address the problem of finding the minimal number of block
interchanges (exchanges of two intervals) required to transform a duplicated
linear genome into a tandem duplicated linear genome. We provide a formula for
the distance as well as a polynomial time algorithm for the sorting problem.

1 Introduction

Genomic rearrangements are known to play a central role in the evolutionary
history of species. Several operations act on the genome, shaping the sequence
of genes. A number of models for sorting one genome into another have been
studied: reversals, transpositions and, more recently, Double-Cut-and-Join
(DCJ). Another operation, called block interchange, consists in exchanging two
intervals of a genome.

Block interchange scenarios were first studied by Christie [1], who proposed a
polynomial-time algorithm for computing the distance between two linear
chromosomes with unique gene content. Lin et al. [5] later proposed a better
algorithm. Yancopoulos et al. [9] introduced the DCJ operation, which consists
in cutting the genome at two points and joining the four resulting extremities
in a different way. Interestingly, they noticed that a block interchange can be
simulated by two consecutive DCJs: an excision followed by a reintegration.

Another very important feature of genome evolution is that genomes often
undergo duplication events, both segmental and whole-genome duplications.
Genome duplication events are followed by other rearrangement events, which
result in a scrambled genome. Genome halving consists in finding a sequence of
events that leads back from the scrambled genome to the original duplicated
one.

Genome halving has been studied under several models: reversals [2],
translocation/reversals [3], DCJ [8], and breakpoints [7]. Most of these
studies led to polynomial-time algorithms. In particular, under the DCJ model,
Mixtacki [6] gave some useful results and data structures, from which we derive
the results of this paper. Very recently, Kovac et al. [4] addressed the
problem of reincorporating the temporary circular chromosomes induced by DCJs
immediately after their creation, in the context of genome halving. Although
this problem is obviously related to the one we address, the aim and the
results are not the same. We are interested in linear genomes, not in
multilinear ones, and we focus on pure block interchange scenarios, whereas
Kovac et al. focused on scenarios made of reversals, translocations, fusions
and fissions along with block interchanges.

Section 2 gives definitions. In Section 3, we first give a lower bound on the 
distance with helpful properties for the rest of the paper. In Section 4, we prove 
the analytical formula for the distance. We conclude in Section 5 with a quadratic 
time and space algorithm to obtain a parsimonious scenario. 

2 Preliminaries: duplicated genomes, rearrangement, 
genome halving problems 

In this section we give the main definitions and notations used in the paper. 

Duplicated Genomes 

A genome is composed of genomic markers organized in linear or circular chromo- 
somes. A linear chromosome is represented by an ordered sequence of unsigned 
integers, each standing for a marker, surrounded by two abstract markers o at 
each end indicating the telomeres. A circular chromosome is represented by a 
circularly ordered sequence of unsigned integers representing markers. For ex- 
ample, (1 2 3) (o 4 5 6 7 o) is a genome constituted of one circular and one 
linear chromosome. 

Definition 1. A rearranged duplicated genome is a genome in which each marker 
appears twice. 

In a rearranged duplicated genome, the two copies of a same marker are called
paralogs. We distinguish paralogs by denoting one marker by x and its paralog
by x̄. By convention, the paralog of x̄ is x. For example, the following genome
is a rearranged duplicated genome:
(o 1 1̄ 3 2 4 5 6 6̄ 7 3̄ 8 2̄ 4̄ 5̄ 9 8̄ 7̄ 9̄ o).

An adjacency in a genome is a pair of consecutive markers. For example, 
the genome (o 1 2 o) (3 4 5) has six adjacencies, (o 1), (1 2), (2 o), 
and (3 4), (4 5), (5 3). The linear or circular order of the markers in a 
chromosome naturally induces an order on the adjacencies that we denote by 
<. For example in the previous genome the order induced on the adjacencies is: 
(o 1) < (1 2) < (2 o), and (3 4) < (4 5) < (5 3) < (3 4). 

A double-adjacency in a genome G is an adjacency (a b) such that (ā b̄)
is an adjacency of G as well. Note that a genome always has an even number
of double-adjacencies. For example, the four double-adjacencies in the
following genome are indicated by dots:

G = (o 1 1̄ 3 2·4·5 6 6̄ 7 3̄ 8 2̄·4̄·5̄ 9 8̄ 7̄ 9̄ o)

A consecutive sequence of double-adjacencies can be rewritten as a single
marker; this process is called reduction. For example, genome G can be reduced
by rewriting 2·4·5 and 2̄·4̄·5̄ as 10 and 1̄0̄, yielding the following genome:

G' = (o 1 1̄ 3 10 6 6̄ 7 3̄ 8 1̄0̄ 9 8̄ 7̄ 9̄ o)
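The reduction step can be sketched in Python. This is only an illustrative
sketch, not code from the paper. It assumes a linear chromosome given as a list
of integers in which each marker occurs exactly twice, with paralogs identified
by value rather than by an overbar. The helper name `reduce_genome` is
hypothetical, and a collapsed block is represented by its first marker instead
of a fresh label such as 10.

```python
def reduce_genome(genome):
    """Collapse every run of double-adjacencies into a single marker.

    Assumes each marker occurs exactly twice in the list. A merged block is
    represented by its first marker (the paper renames it instead).
    """
    g = list(genome)
    changed = True
    while changed:
        changed = False
        # partner[i] is the index of the other copy (paralog) of g[i].
        seen, partner = {}, [None] * len(g)
        for idx, marker in enumerate(g):
            if marker in seen:
                partner[idx], partner[seen[marker]] = seen[marker], idx
            else:
                seen[marker] = idx
        for i in range(len(g) - 1):
            # (g[i] g[i+1]) is a double-adjacency when the two paralogs are
            # also consecutive and appear in the same order.
            if partner[i + 1] == partner[i] + 1:
                # Drop the second marker of both copies, highest index first.
                for pos in sorted((i + 1, partner[i + 1]), reverse=True):
                    del g[pos]
                changed = True
                break
    return g
```

On the genome G above, the two copies of the block 2·4·5 each collapse to the
single marker 2, which plays the role of marker 10 in the text.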



Definition 2. A tandem-duplicated genome is a rearranged duplicated genome
which can be reduced to a genome of the form (o x x̄ o).

In other words, a tandem-duplicated genome is composed of a single linear
chromosome where all adjacencies, except the two containing the marker o and
the central adjacency, are double-adjacencies. For example, the genome
(o 1·2·3·4 1̄·2̄·3̄·4̄ o) is a tandem-duplicated genome that can be
reduced to (o 5 5̄ o) by rewriting 1·2·3·4 and 1̄·2̄·3̄·4̄ as 5 and 5̄.

Definition 3. A perfectly duplicated genome is a rearranged duplicated genome
such that each adjacency is a double-adjacency.

For example, the genome (1 2 3 4 1̄ 2̄ 3̄ 4̄) is a perfectly duplicated
genome.

Rearrangements 

A rearrangement operation on a given genome cuts a set of adjacencies of the 
genome called breakpoints and forms new adjacencies with the exposed extrem- 
ities, while altering no other adjacency. In the sequel, the adjacencies cut by a 
rearrangement operation are indicated in the genome by the symbol ^ . 

An interval in a genome is a set of markers that appear consecutively in the
genome. Given two different adjacencies (a b) and (c d) in a genome A such
that (a b) < (c d), [b ; c] denotes the interval of A beginning with marker b
and ending with marker c.

In this paper, we consider two types of rearrangement operations called block
interchange (BI) and double-cut-and-join (DCJ).

A block interchange (BI) on a genome G is a rearrangement operation that
acts on four adjacencies in G, (a b) < (c d) < (u v) < (x y), such that
the intervals [b ; c] and [v ; x] do not overlap, swapping the intervals
[b ; c] and [v ; x]. For example, the following block interchange acting on
adjacencies (1̄ 2) < (6 6̄) < (3̄ 8) < (8̄ 7̄) consists in swapping the
intervals [2 ; 6] and [8 ; 8̄]:

(o 1 1̄ ^ 2 3 2̄ 4 5 6 ^ 6̄ 7 3̄ ^ 8 4̄ 9 5̄ 8̄ ^ 7̄ 9̄ o)
↓
(o 1 1̄ 8 4̄ 9 5̄ 8̄ 6̄ 7 3̄ 2 3 2̄ 4 5 6 7̄ 9̄ o)
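A block interchange is easy to express on a list representation. The following
is a minimal sketch under stated assumptions: the chromosome is a Python list
without the telomere markers, paralogs are not distinguished (the overbars play
no role in the operation itself), and intervals are given as 0-based half-open
index ranges. The function name `block_interchange` is hypothetical.

```python
def block_interchange(genome, i, j, k, l):
    """Swap the intervals genome[i:j] and genome[k:l] (0-based, half-open).

    Requires 0 <= i < j <= k < l <= len(genome), so that both intervals are
    non-empty and do not overlap. The part genome[j:k] between them stays in
    the middle and may be empty.
    """
    if not 0 <= i < j <= k < l <= len(genome):
        raise ValueError("intervals must be non-empty and non-overlapping")
    return genome[:i] + genome[k:l] + genome[j:k] + genome[i:j] + genome[l:]


# The example from the text: swap the interval starting at the first 2 and
# ending at the first 6 with the interval delimited by the two copies of 8.
g = [1, 1, 2, 3, 2, 4, 5, 6, 6, 7, 3, 8, 4, 9, 5, 8, 7, 9]
swapped = block_interchange(g, 2, 8, 11, 16)
```

With these indices, `swapped` is the second genome of the example read without
overbars.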

A double-cut-and-join (DCJ) operation on a genome G cuts two different
adjacencies in G and glues pairs of the four exposed extremities to form two new 
adjacencies. Here, we focus on two types of DCJ operations called excision and 
integration. 

An excision is a DCJ operation acting on a single chromosome by extracting 
an interval from it, making this interval a circular chromosome, and making 
the remainder a single chromosome (1 join). For example, the following excision 
extracts the circular chromosome (2 3 4): 

(o 1 ^ 2 3 4 ^ 5 6 o) → (2 3 4) (o 1 5 6 o)



An integration is the inverse of an excision; it is a DCJ operation that acts 
on two chromosomes, one being a circular chromosome, to produce a single chro- 
mosome. For example, the following operation is an integration of the circular 
chromosome (2 3 4): 

(2 ^ 3 4) (o 1 5 6 ^ o) → (o 1 5 6 3 4 2 o)

We now give an obvious, but very useful, property linking BI operations to 
DCJ operations. 

Property 1. A single BI operation on a linear chromosome is equivalent to two 
DCJ operations: an excision followed by an integration. 

Proof. Let (o 1 U 2 V 3 o) be a genome, U and V the two intervals that
are to be swapped by a block interchange operation, and 1, 2 and 3 the
intervals constituting the rest of the genome (note that each of them may be
empty).

The first DCJ operation is the excision that produces the adjacency (1 V)
by extracting and circularizing the interval [U ; 2]:

(o 1 ^ U 2 ^ V 3 o) → (o 1 V 3 o) (U 2)

The second DCJ operation is the integration that produces the adjacency
(U 3) by reintegrating the circular chromosome (U 2) in the appropriate way:

(o 1 V ^ 3 o) (U 2 ^) → (o 1 V 2 U 3 o)
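The two steps of this proof can be replayed on lists. The sketch below is an
illustration only. It assumes a linear chromosome stored as a Python list
without telomeres, and a circular chromosome stored as a list read cyclically.
The helper names `excise`, `integrate` and `bi_by_two_dcj` are hypothetical.

```python
def excise(genome, i, k):
    """Excision: cut out genome[i:k] and circularise it.

    Returns (linear chromosome, circular chromosome as a cyclic list).
    """
    return genome[:i] + genome[k:], genome[i:k]


def integrate(linear, circular, cut, pos):
    """Integration: open the circular chromosome just before circular[cut]
    and insert the resulting linear piece at index pos of the linear one."""
    opened = circular[cut:] + circular[:cut]
    return linear[:pos] + opened + linear[pos:]


def bi_by_two_dcj(genome, i, j, k, l):
    """Swap U = genome[i:j] and V = genome[k:l] as in the proof of Property 1."""
    # First DCJ: excise [U ; 2], giving (o 1 V 3 o) and the circle (U 2).
    linear, circular = excise(genome, i, k)
    # Second DCJ: open the circle between U and 2 and insert "2 U" after V.
    return integrate(linear, circular, j - i, i + (l - k))
```

Applied to the block interchange example given earlier, this composition yields
the same chromosome as swapping the two intervals directly.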



A rearrangement scenario between two genomes A and B is a sequence of
rearrangement operations that transforms A into B.

Definition 4. A BI (resp. DCJ) scenario is a rearrangement scenario composed 
of BI (resp. DCJ) operations. 

The length of a rearrangement scenario is the number of rearrangement op- 
erations composing the scenario. 

Definition 5. The BI (resp. DCJ) distance between two genomes A and B,
denoted by $d_{BI}(A, B)$ (resp. $d_{DCJ}(A, B)$), is the minimal length of a
BI (resp. DCJ) scenario between A and B.

Genome Halving 

We now state the genome halving problem considered in this paper. 

Definition 6. Given a rearranged duplicated genome G composed of a single
linear chromosome, the BI halving problem consists in finding a
tandem-duplicated genome H such that the BI distance between G and H is
minimal.



In order to solve the BI halving problem, we use some results on the DCJ 
halving problem that were stated in [6] as a starting point. Unlike the BI halving 
problem, the aim of the DCJ halving problem is to find a perfectly duplicated 
genome instead of a tandem-duplicated genome. 

Definition 7 ([6]). Given a rearranged duplicated genome G, the DCJ genome
halving problem consists in finding a perfectly duplicated genome H such that
the DCJ distance between G and H is minimal.

The BI and DCJ genome halving problems lead to two definitions of halving
distances: the BI halving distance (resp. DCJ halving distance) of a rearranged
duplicated genome G is the minimum BI (resp. DCJ) distance between G and
any tandem-duplicated genome (resp. any perfectly duplicated genome); we
denote it by $d^t_{BI}(G)$ (resp. $d^p_{DCJ}(G)$).



3 Lower bound for the BI halving distance

In this section we give a lower bound on the BI halving distance of a rearranged
duplicated genome. We use a data structure representing the genome called the
natural graph, introduced in [6].

Definition 8. The natural graph of a rearranged duplicated genome G, denoted
by NG(G), is the graph whose vertices are the adjacencies of G, and for any
marker u there is one edge between (u v) and (ū w), and one edge between
(x u) and (y ū).

Note that the number of edges in the natural graph of a genome G containing
n distinct markers, each one present in two copies, is always 2n. Moreover,
since every vertex has degree one or two, the natural graph consists only of
cycles and paths. For example, the natural graph of genome
G = (o 1 2 1̄ 4 3 4̄ 3̄ 2̄ o) is depicted in Fig. 1.

Fig. 1. The natural graph of genome G = (o 1 2 1̄ 4 3 4̄ 3̄ 2̄ o); it is
composed of one path and two cycles.
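The natural graph can be built directly from Definition 8. The sketch below is
an illustration under assumptions that are not in the paper: the chromosome is a
Python list of markers without telomeres, each marker occurs exactly twice,
paralogs are identified by value, and adjacency number p stands for the pair of
markers at positions p and p + 1 once a telomere is added at each end. The
function name `natural_graph_components` is hypothetical.

```python
from collections import Counter, defaultdict


def natural_graph_components(genome):
    """Return (cycle lengths, path lengths) of the natural graph, in edges."""
    ext = ["o"] + list(genome) + ["o"]
    n_adj = len(ext) - 1                     # adjacency p = (ext[p], ext[p+1])
    positions = defaultdict(list)
    for idx in range(1, len(ext) - 1):
        positions[ext[idx]].append(idx)

    # For a marker at positions p and q: the adjacencies where it comes first
    # are p and q, the adjacencies where it comes second are p-1 and q-1.
    edges = []
    for p, q in positions.values():
        edges += [(p, q), (p - 1, q - 1)]

    # Union-find over adjacencies to group them into connected components.
    parent = list(range(n_adj))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)

    n_vertices = Counter(find(v) for v in range(n_adj))
    n_edges = Counter(find(a) for a, _ in edges)
    # A component with as many edges as vertices is a cycle; with one edge
    # fewer it is a path.
    cycles = sorted(n_edges[r] for r in n_vertices if n_edges[r] == n_vertices[r])
    paths = sorted(n_edges[r] for r in n_vertices if n_edges[r] == n_vertices[r] - 1)
    return cycles, paths
```

For the genome of Fig. 1 this gives a 2-cycle, a 4-cycle and a 2-path, in line
with the caption. The 2-cycle comes from the double-adjacency formed by the two
occurrences of 4 3.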



Definition 9. Given an integer k, a k-cycle (resp. k-path) in the natural
graph of a rearranged duplicated genome is a cycle (resp. path) that contains k
edges. If k is even, the cycle (resp. path) is called even, and odd otherwise.



Based on the natural graph, a formula for the DCJ halving distance was
given in [6]. Given a rearranged duplicated genome G such that the number of
even cycles and the number of odd paths in NG(G) are respectively denoted by
EC and OP, the DCJ halving distance of G is:

$d^p_{DCJ}(G) = n - EC - \lfloor OP/2 \rfloor$

In the case of the BI halving distance, some peculiar properties of the natural
graph need to be stated; they simplify the formula of the DCJ halving distance
and lead to a lower bound on the BI halving distance.

In the following properties, we assume that G is a genome composed of a 
single linear chromosome containing n distinct markers, each one present in two 
copies in G. 

Property 2. The natural graph NG(G) contains only even cycles and paths: 

1. All cycles in the natural graph NG(G) are even. 

2. The natural graph NG(G) contains only one path, and this path is even. 

Proof. First, if (a x) is a vertex of the graph that belongs to a cycle C, then
there exists an edge between (a x) and a vertex (ā y). These two adjacencies
are the only two containing a copy of the marker a at the first position. So, if
we consider the set of all the first markers in all adjacencies contained in the
cycle C, then each marker in this set is present exactly twice. Therefore, the
cycle C is an even cycle.

Secondly, the graph contains exactly two vertices (adjacencies) containing
the marker o, which are both necessarily ends of a path in NG(G). Thus there can
be only one path in the graph. Since the number of edges in the graph is even
and all cycles are even, the single path is also even. □

We now give a lower bound on the minimum length of a DCJ scenario
transforming G into a tandem-duplicated genome.

Lemma 1. Let $d^t_{DCJ}(G)$ be the minimum DCJ distance between G and any
tandem-duplicated genome. If NG(G) contains C cycles, then a lower bound on
$d^t_{DCJ}(G)$ is given by:

$d^t_{DCJ}(G) \geq n - C - 1$

Proof. First, since all cycles of NG(G) are even and NG(G) contains no odd path,
then, from the DCJ halving distance formula, the DCJ halving distance of G is
$d^p_{DCJ}(G) = n - C$.

Now, since any tandem-duplicated genome can be transformed into a perfectly
duplicated genome with one DCJ, we have $d^t_{DCJ}(G) + 1 \geq d^p_{DCJ}(G)$.
Therefore, $d^t_{DCJ}(G) \geq d^p_{DCJ}(G) - 1 \geq n - C - 1$. □

We are now ready to state a lower bound on the BI halving distance of a
rearranged duplicated genome G.

Theorem 1. If NG(G) contains C cycles, then a lower bound on the BI halving
distance is given by:

$d^t_{BI}(G) \geq \lfloor (n - C)/2 \rfloor$

Proof. We denote by $\ell(S)$ the length of a rearrangement scenario S. Let
$S_{BI}$ be a BI scenario transforming G into a tandem-duplicated genome. From
Property 1, $S_{BI}$ is equivalent to a DCJ scenario $S_{DCJ}$ such that
$\ell(S_{DCJ}) = 2\,\ell(S_{BI})$. Now, suppose that
$\ell(S_{BI}) < \lfloor (n - C)/2 \rfloor$; then
$\ell(S_{BI}) \leq \lfloor (n - C)/2 \rfloor - 1 \leq (n - C)/2 - 1$.

This implies
$\ell(S_{DCJ}) \leq 2 \lfloor (n - C)/2 \rfloor - 2 \leq n - C - 2 < n - C - 1$.
Thus, from Lemma 1, we have $\ell(S_{DCJ}) < d^t_{DCJ}(G)$, which contradicts
the fact that $d^t_{DCJ}(G)$ is the minimal number of DCJ operations required to
transform G into a tandem-duplicated genome.

In conclusion, we always have $d^t_{BI}(G) \geq \lfloor (n - C)/2 \rfloor$. □
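As a worked illustration, take the genome of Fig. 1,
G = (o 1 2 1̄ 4 3 4̄ 3̄ 2̄ o). It has n = 4 distinct markers, and its natural
graph has C = 2 cycles, both even, and a single even path, so EC = 2 and
OP = 0. Writing $d^p_{DCJ}$ for the DCJ halving distance, $d^t_{DCJ}$ for the
DCJ distance to the nearest tandem-duplicated genome and $d^t_{BI}$ for the BI
halving distance, substituting into the formulas of this section gives:

```latex
\begin{align*}
d^p_{DCJ}(G) &= n - EC - \lfloor OP/2 \rfloor = 4 - 2 - 0 = 2,\\
d^t_{DCJ}(G) &\geq n - C - 1 = 4 - 2 - 1 = 1,\\
d^t_{BI}(G)  &\geq \lfloor (n - C)/2 \rfloor = \lfloor 2/2 \rfloor = 1.
\end{align*}
```

The last bound is attained for this genome: swapping the adjacent intervals
2 1̄ and 4 3 yields (o 1 4 3 2 1̄ 4̄ 3̄ 2̄ o), which is tandem-duplicated.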

4 Formula for the BI halving distance 

In this section, we show that the BI halving distance of a rearranged duplicated
genome G with n distinct markers such that NG(G) contains C cycles is exactly:

$d^t_{BI}(G) = \lfloor (n - C)/2 \rfloor$

In other words, we show that enforcing the constraint that two consecutive
DCJs have to be equivalent to a BI does not change the distance (even though it
obviously restricts the DCJs that can be performed at each step of the
scenario).

In the following, G denotes a rearranged duplicated genome consisting of a
single linear chromosome with n distinct markers after the reduction process,
and such that NG(G) contains C cycles. We begin by recalling some useful
definitions and properties of the DCJ operations that decrease the DCJ halving
distance by 1 in the resulting genome.

Definition 10. A DCJ operation on G producing genome G' is sorting if it
decreases the DCJ halving distance by 1:
$d^p_{DCJ}(G') = d^p_{DCJ}(G) - 1 = n - C - 1$.

Since the number of distinct markers in G' is n and
$d^p_{DCJ}(G') = n - C - 1$, NG(G') contains C + 1 cycles. In other words, a
DCJ operation is sorting if it increases the number of cycles in NG(G) by 1.

Given (u v) an adjacency of G that is not a double-adjacency, we denote by
DCJ(u v) the DCJ operation that cuts adjacencies (ū x) and (y v̄) to form
adjacencies (ū v̄) and (y x), making (u v) a double-adjacency.

Property 3. Let (u v) be an adjacency of G that is not a double-adjacency.
Then DCJ(u v) is a sorting DCJ operation.

Proof. DCJ(u v) increases the number of cycles in NG(G) by 1, by creating a
new cycle composed of adjacencies (u v) and (ū v̄). □

Fig. 2. I(G) = { ]2̄ ; 1̄[ , [2 ; 1̄] , ]2 ; 3̄[ , [1 ; 3̄] , ]1 ; 3[ }, the set
of intervals of G = (o 2 1 2̄ 3 1̄ 3̄ o) depicted as boxes. The two boxes with
thick lines represent two overlapping intervals of I(G) inducing a BI which
exchanges 2 and 3̄.



Definition 11. Let (u v), (ū x), and (y v̄) be adjacencies of G. The interval
of the adjacency (u v), denoted by I(u v), is either:

- the interval [x ; y] if (ū x) < (y v̄). In this case, we denote it by
  ]ū ; v̄[, or

- the interval [v̄ ; ū] if (y v̄) < (ū x).

For example, the intervals of the adjacencies in genome (o 2 1 2̄ 3 1̄ 3̄ o)
are depicted in Fig. 2. Note that, given an adjacency (u v) of G, if (u v) is a
double-adjacency then the interval I(u v) is empty; otherwise DCJ(u v) is the
excision operation that extracts the interval I(u v) to make it circular, thus
producing the adjacency (ū v̄).

Two intervals I(a b) and I(x y) are said to be overlapping if their
intersection is non-empty and neither interval is included in the other. It is
easy to see, following Property 1, that given two adjacencies (a b) and (x y)
of G such that I(a b) and I(x y) are non-empty intervals, the successive
application of DCJ(a b) and DCJ(x y) is equivalent to a BI operation if and
only if I(a b) and I(x y) are overlapping. Note that in this case neither
(a b) nor (x y) can be a double-adjacency in G, since their intervals are
non-empty. Figure 2 shows an example of two overlapping intervals.

The following property states precisely in which case the successive
application of DCJ(a b) and DCJ(x y) decreases the DCJ halving distance by 2,
meaning that both DCJ operations are sorting.

Property 4. Given two adjacencies (a b) and (x y) of G such that I(a b) and
I(x y) are overlapping, the successive application of DCJ(a b) and DCJ(x y)
decreases the DCJ halving distance by 2 if and only if x ≠ ā and y ≠ b̄.

Proof. If x ≠ ā and y ≠ b̄, then the successive application of DCJ(a b) and
DCJ(x y) increases the number of cycles in NG(G) by 2, by creating two new
2-cycles. Otherwise, DCJ(a b) first creates a new cycle that is then destroyed
by DCJ(x y). □



We denote by I(G) the set of intervals of all the adjacencies of G that do
not contain marker o.

Remark 1. Note that, if G contains n distinct markers, then there are 2n − 1
adjacencies in G that do not contain marker o, defining 2n − 1 intervals in
I(G).

Definition 12. Two intervals I(a b) and I(x y) of I(G) are said to be
compatible if they are overlapping and x ≠ ā and y ≠ b̄.
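The notions of interval, overlap and compatibility can be made concrete with a
short sketch. It is an illustration under assumptions of our own, not the
authors' implementation: the chromosome is a Python list of markers without
telomeres, each marker occurs exactly twice, paralogs are identified by value,
positions are numbered from 1, and the adjacency formed by positions p and
p + 1 is referred to by p. All function names are hypothetical. The pair search
is a plain quadratic scan, not the linear-time method discussed in Section 5.

```python
def partners(genome):
    """Map each position 1..2n to the position of its paralog."""
    first, partner = {}, {}
    for pos, marker in enumerate(genome, start=1):
        if marker in first:
            partner[pos], partner[first[marker]] = first[marker], pos
        else:
            first[marker] = pos
    return partner


def intervals(genome):
    """Return {p: (start, end)} for each adjacency p without a telomere.

    (start, end) are inclusive positions; the value is None when the
    adjacency is a double-adjacency, whose interval is empty.
    """
    partner = partners(genome)
    result = {}
    for p in range(1, len(genome)):
        pu, pv = partner[p], partner[p + 1]   # positions of u-bar and v-bar
        if pu + 1 == pv:
            result[p] = None                  # u-bar and v-bar already adjacent
        elif pu < pv:
            result[p] = (pu + 1, pv - 1)      # strictly between u-bar and v-bar
        else:
            result[p] = (pv, pu)              # from v-bar to u-bar, inclusive
    return result


def overlapping(i1, i2):
    """Non-empty intersection and neither interval included in the other."""
    (s1, e1), (s2, e2) = i1, i2
    return (s1 < s2 <= e1 < e2) or (s2 < s1 <= e2 < e1)


def compatible_pairs(genome):
    """List pairs (p, q), p < q, of adjacencies with compatible intervals."""
    partner, ivs = partners(genome), intervals(genome)
    pairs = []
    for p, ip in ivs.items():
        for q, iq in ivs.items():
            if p < q and ip and iq and overlapping(ip, iq):
                # x != a-bar and y != b-bar, expressed on positions.
                if q != partner[p] and q + 1 != partner[p + 1]:
                    pairs.append((p, q))
    return pairs
```

On the genome of Fig. 2, written here as `[2, 1, 2, 3, 1, 3]`, the five
intervals cover positions 4 to 4, 1 to 5, 2 to 5, 2 to 6 and 3 to 3. The only
overlapping pair is the one drawn with thick lines, and it is also compatible.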

In the following, we prove the BI halving distance formula by showing that
if genome G contains more than three distinct markers, n > 3, then there exist
two compatible intervals in I(G), and if n = 2 or n = 3 then
$d^t_{BI}(G) = 1$ and $2 \leq d^p_{DCJ}(G) \leq 3$. This means that there
exists a BI halving scenario S such that all BI operations in S, possibly
excluding the last one, are equivalent to two successive sorting DCJ
operations.

From now on, until the end of the section, (a b) is an adjacency of G that is
not a double-adjacency, and A is the genome, consisting of a linear chromosome
ℒ and a circular chromosome 𝒞, obtained by applying the sorting DCJ,
DCJ(a b), on G.

If there exists an interval I(x y) in I(G) compatible with I(a b), then
applying DCJ(x y) on A consists in the integration of the circular chromosome
𝒞 into the linear chromosome ℒ such that the adjacency (x̄ ȳ) is formed. Such
an integration can only be performed by cutting an adjacency (x̄ u) in 𝒞 and
an adjacency (v ȳ) in ℒ (or inversely) to produce adjacencies (x̄ ȳ) and
(v u). This means that there must be an adjacency (x y) in either 𝒞 or ℒ such
that x̄ is in 𝒞 and ȳ in ℒ, or inversely. Hence, we have the following
property:

Property 5. 𝒞 cannot be reintegrated into ℒ by applying a sorting DCJ,
DCJ(x y), on A if and only if either:

(1) for any adjacency (x y) in 𝒞 (resp. ℒ), markers x̄ and ȳ are in ℒ
(resp. 𝒞), or

(2) for any adjacency (x y) in 𝒞 (resp. ℒ), markers x̄ and ȳ are also in 𝒞
(resp. ℒ).

Proof. If there exists no adjacency (x y) in A such that x̄ is in 𝒞 and ȳ in
ℒ, or inversely, then A necessarily satisfies either (1) or (2). □

Definition 13. An interval I(a b) in I(G) is called an interval of type 1
(resp. interval of type 2) if DCJ(a b) produces a genome A satisfying
configuration (1) (resp. configuration (2)) described in Property 5.

For example, in genome (o 2 1 1̄ 3 2̄ 3̄ o), I(1̄ 3) is of type 1 as DCJ(1̄ 3)
produces genome (o 2 1 3̄ o) (1̄ 3 2̄); I(2̄ 3̄) is of type 2 as DCJ(2̄ 3̄)
produces genome (o 2 3 2̄ 3̄ o) (1 1̄).
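The two configurations can be tested directly on the excised interval. The
sketch below rests on our reading of Property 5, which is an assumption: an
interval is of type 1 when the excised circular chromosome holds exactly one
copy of every marker, so that all paralogs lie in the other chromosome, and of
type 2 when it holds both copies of each marker it contains. The chromosome is
a Python list without telomeres, paralogs are identified by value, and the
function name `interval_type` is hypothetical.

```python
from collections import Counter


def interval_type(genome, start, end):
    """Classify the interval at positions start..end (1-based, inclusive).

    Returns 1 if the interval holds exactly one copy of every marker,
    2 if it holds both copies of each marker it contains, None otherwise.
    """
    counts = Counter(genome[start - 1:end])
    n = len(genome) // 2                      # number of distinct markers
    if counts and len(counts) == n and all(c == 1 for c in counts.values()):
        return 1
    if counts and all(c == 2 for c in counts.values()):
        return 2
    return None
```

For the genome of the example, written `[2, 1, 1, 3, 2, 3]`, the interval at
positions 3 to 5 (the circular chromosome of the first DCJ) is classified as
type 1, and the interval at positions 2 to 3 (the two copies of marker 1) as
type 2.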

Now we give the maximum numbers of intervals of type 1 and type 2 that 
can be contained in genome G. 



Lemma 2. The maximum number of intervals of type 1 in I(G) is 2.

Proof. First, note that there cannot be two intervals I and J of I(G) such
that I ≠ J and both I and J are of type 1. Now, if I is an interval of type
1, there can be at most two different adjacencies (x y) and (u v) such that
I(x y) = I(u v) = I. In this case G necessarily has a chromosome of the form
(... x̄ v̄ ... ū ȳ ...) or (... ū ȳ ... x̄ v̄ ...). Therefore, there are at
most two intervals of type 1 in I(G). □

Lemma 3. The maximum number of intervals of type 2 in I(G) is n.

Proof. First, note that for two adjacencies (x y) and (x̄ z) in G that do not
contain marker o, if I(x y) is of type 2 then I(x̄ z) cannot be of type 2. Now,
there is only one marker u such that (u o) is an adjacency of G. Let (ū v) be
the adjacency of G having ū as first marker; then at most half of the intervals
in I(G) − {I(ū v)} can be of type 2. Therefore, there are at most n intervals
of type 2 in I(G). □

Theorem 2. If NG(G) contains C cycles, then the BI halving distance of G is
given by:

$d^t_{BI}(G) = \lfloor (n - C)/2 \rfloor$

Proof. Since there are 2n − 1 intervals in I(G), and at most n + 2 are of type
1 or 2, if G is a genome containing more than three distinct markers, n > 3,
then 2n − 1 > n + 2 and there exist two compatible intervals in I(G) inducing
a BI operation that decreases the DCJ halving distance by 2.

Next, we show that if n = 2 or n = 3, then $d^t_{BI}(G) = 1$ and
$2 \leq d^p_{DCJ}(G) \leq 3$.

If n = 2, then the genome can be written either as (o a b b̄ ā o), in which
case a BI can swap a and b to produce a tandem-duplicated genome, or as
(o a ā b b̄ o), in which case a BI can swap a and ā b to produce a
tandem-duplicated genome.

If n = 3, then the genome has two double-adjacencies to be constructed,
of the form (ā b̄), (x̄ ȳ), with (a b) and (x y) being two adjacencies already
present in the genome such that b ≠ x, or b = x and a and y are distinct
markers. One can rewrite (a b) and (x y) as single markers since they will not
be split, which makes a genome with 4 markers such that at most 2 are
misplaced. Then, a single BI can produce a tandem-duplicated genome.

Now, it is easy to see that if n = 2 or n = 3, then
$d^p_{DCJ}(G) = n - C \leq 3$. Finally, if n = 2 or n = 3, then
$d^p_{DCJ}(G) \geq 2$, otherwise we would have $d^p_{DCJ}(G) = 1$, which would
imply, as G consists of a single linear chromosome, $d^t_{BI}(G) = 0$. In
conclusion, if n > 3 then there exist two compatible intervals in I(G);
otherwise, if n = 2 or n = 3, then $d^t_{BI}(G) = 1$ and
$2 \leq d^p_{DCJ}(G) \leq 3$.

Therefore $d^t_{BI}(G) = \lfloor d^p_{DCJ}(G)/2 \rfloor = \lfloor (n - C)/2 \rfloor$. □



5 Sorting algorithm 

In Section 4, we showed that if a genome G contains more than three distinct
markers after reduction then there exist two compatible intervals in I(G)
inducing a BI to perform. If G contains two or three distinct markers then the
BI to perform can be trivially computed. Thus the main concern of this section
is to describe an efficient algorithm for finding compatible intervals when
n > 3.

As in Section 4, in the following, G denotes a genome consisting of n distinct
markers after reduction. It is easy to show that the set of intervals I(G) can
be built in O(n) time and space complexity.

We now show that finding two compatible intervals in I(G) can be done in
O(n) time and space complexity.

Property 6. If n > 3, then all the smallest intervals in I(G) that are not of
type 2 admit compatible intervals.

Proof. Let J be a smallest interval that is not of type 2 in I(G). As J is not
of type 2, J has compatible intervals if J is not of type 1.

Let us suppose that J is of type 1. Then for any adjacency (a b) such that
markers a and b are not in J, ā and b̄ are in J, so I(a b) is strictly included
in J and I(a b) cannot be of type 2. Such an adjacency does exist, as there are
n > 3 markers not included in J. Therefore J cannot be a smallest interval that
is not of type 2. □

We are now ready to give the algorithm for sorting a duplicated genome G
into a tandem-duplicated genome with $\lfloor (n - C)/2 \rfloor$ BI operations.

Algorithm 1 Reconstruction of a tandem-duplicated genome

while G contains more than 3 markers do
    Construct I(G)
    Pick a smallest interval I(a b) that is not of type 2 in I(G)
    Find an interval I(x y) in I(G) compatible with I(a b)
    Perform the BI equivalent to DCJ(a b) followed by DCJ(x y)
    Reduce G
end while
if G contains 2 or 3 markers then
    Find the last BI operation and perform it
end if

Theorem 3. Algorithm 1 reconstructs a tandem-duplicated genome with a BI
scenario of length $\lfloor (n - C)/2 \rfloor$ in $O(n^2)$ time and space
complexity.

Proof. Building I(G) and finding two compatible intervals can be done in O(n)
time and space complexity. It follows that the while loop in the algorithm can
be computed in $O(n^2)$ time and space complexity.

Finding and performing the last BI operation when 2 ≤ n ≤ 3 can be done
in constant time and space complexity.

Moreover, all BI operations, possibly excluding the last one, are computed
as pairs of sorting DCJ operations, which ensures that the length of the
scenario is $\lfloor (n - C)/2 \rfloor$. □
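For very small genomes, the distance formula can be cross-checked by exhaustive
search. The sketch below is not Algorithm 1 and makes no claim of efficiency:
it is a breadth-first search over all block interchanges, assuming a chromosome
stored as a sequence of markers without telomeres, each occurring exactly
twice, with paralogs identified by value. A chromosome is taken to be
tandem-duplicated when its two halves are identical. The function names are
hypothetical.

```python
from collections import deque


def is_tandem(g):
    """True when the chromosome reads w w, i.e. reduces to (o x x-bar o)."""
    half = len(g) // 2
    return g[:half] == g[half:]


def neighbours(g):
    """Yield every chromosome reachable from g by one block interchange."""
    m = len(g)
    for i in range(m):
        for j in range(i + 1, m):
            for k in range(j, m):
                for l in range(k + 1, m + 1):
                    # Swap g[i:j] and g[k:l]; g[j:k] may be empty.
                    yield g[:i] + g[k:l] + g[j:k] + g[i:j] + g[l:]


def bi_halving_distance_bruteforce(genome):
    """Minimum number of block interchanges to reach a tandem-duplicated
    chromosome, found by breadth-first search (small inputs only)."""
    start = tuple(genome)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        g, dist = queue.popleft()
        if is_tandem(g):
            return dist
        for h in neighbours(g):
            if h not in seen:
                seen.add(h)
                queue.append((h, dist + 1))
    return None
```

For the genome of Fig. 1, where n = 4 and C = 2, the search returns 1, which
agrees with $\lfloor (n - C)/2 \rfloor$. For a chromosome of the form
1 1 2 2 3 3 4 4, whose natural graph is a single path (C = 0), it returns 2.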

6 Conclusion 

In this paper, we introduced the BI halving problem. We used the DCJ model
to simulate BI operations and showed that it is always possible to choose
two consecutive sorting DCJ operations such that they are equivalent to a BI
operation. We thus provide a quadratic time and space algorithm to obtain a
most parsimonious scenario, as any computed BI scenario is in fact an optimal
DCJ scenario. Finally, one direction for further studies of variants of the BI
halving problem is to consider multichromosomal genomes and BI operations
acting on more than one chromosome.